PURPOSE


This  report  describes  the  progress  of  two  projects. 
The  first,  entitled  the  Machine  Language  Translation  Study,  has 
been  in  progress  since  May  1959  under  sponsorship  of  U.S.  Army 
Signal Research and Development Laboratory. It has a long-range
but primarily practical purpose: to implement automatic
translation  of  languages  by  means  of  a  large-scale,  generalized 
computer  system.  The  second,  entitled  the  Development  of  a 
Linguistic  Computer  System,  was  initiated  in  September  1961 
through  a  grant  from  the  National  Science  Foundation.  It  has 
a  broader  purpose:  to  support  basic  research  in  linguistics 
by  means  of  a  computer  system  with  generalized  capabilities 
for language data processing and linguistic information
processing. Those programs and portions of the work which
primarily concern the development of a machine translation system
are supported by the United States Army. The work directed
primarily toward the linguistic computer system is supported
by the National Science Foundation. These two projects
complement each other in that programs prepared for each are
applicable to, and needed in, the research of the other.

ABSTRACT


Progress is reported toward the implementation of a
computer system to support mechanical translation and other
linguistic processes which are potentially applicable to
scientific documentation. The system contains three sections: one
for control, a second for language data processing, and a third
for linguistic information processing. The first is now
operational, and the second will be so in the next quarter. The
third is now being developed at the level of syntactic analysis.

PUBLICATIONS, LECTURES, REPORTS AND CONFERENCES


Dr. W. P. Lehmann visited the machine translation
research group at the University of California, Berkeley, California,
on 1-2 August. He consulted with Dr. S. Lamb and other members of
his group about their work in Russian and Chinese.

On 1-3 August Mr. E. Pendergraft visited the Centre de
Traitement de l'Information Scientifique (CETIS) at the EURATOM
Centre Commun de Recherche, Ispra, Italy. He conferred with
Dr. Y. Lecerf about the work in mechanical translation under his
direction, and with Dr. D. G. Hays, who has taken a year's leave
of absence from the RAND Corporation to pursue research in
semantics as a guest at Ispra. Mr. Pendergraft talked at length with
Dr. N. Detaat concerning work on information retrieval in progress
at the center, and with Dr. J. Verheyden about his work on syntax.
On 2 August he presented the current work of the Linguistics
Research Center to members of the EURATOM mechanical translation
group.

Mr. Pendergraft went to the Centro di Cibernetica e di
Attività Linguistiche at the University of Milan, Italy, on 4-7
August. He conferred with Prof. S. Ceccato, who directs research
on automatic language translation at the center, and discussed the
project's approach to semantics with Mr. E. Glasersfeld. Through
the assistance of Mr. E. Marietti, he became familiar with some
of the programming techniques being used at the center.

During the week of 27-31 August the Ninth International
Congress of Linguists was held jointly by M.I.T. and Harvard
University in Cambridge, Mass. Members of the Linguistics Research
Center in attendance were Drs. W. P. Lehmann, W. Tosh, S. N. Werbow,
W. Winter and Miss A. Yue. Papers were presented to sections of
the congress by Dr. Lehmann ("Types of Sound Change", Section on
Linguistic Change), Dr. Tosh ("Content Recognition and the
Production of Synonymous Expressions", Section on Applications of
Computers), and Dr. Winter ("Styles as Dialects", Section on
Stylistics). All papers will be published under the title
Proceedings of the Ninth International Congress of Linguists.
The congress afforded an opportunity for numerous conversations
with linguists interested in different aspects of machine
translation research. During the week Dr. Tosh and Dr. Werbow
discussed the status of linguistic work under Contract Grant NSF
6-19277 with Mr. R. See of the National Science Foundation.

On 24 September Dr. W. P. Lehmann and Dr. Tosh discussed
our linguistic work briefly with Mr. E. Companys, who is Chargé
d'études with the Bureau d'études et de Liaison pour l'Enseignement
du Français dans le monde, Paris.

The Linguistics Research Center was visited on 9-11
October by a group of specialists from Germany who are making a
survey of mechanical translation research in this country as a
basis for recommendations to their government concerning the
initiation of a program of its own. Mr. R. F. Krollman, Director
of the Übersetzerdienst der Bundeswehr, the translating bureau of
the German Armed Forces at Mannheim, was a member of the group.
This agency employs some 300 professional translators and editors
to translate all technical literature required by the Bundeswehr.
Mr. D. Himburg represented the Military Research and Development
Division of the German Ministry of Defense; he is responsible for
sponsorship of basic and applied research in data processing,
computation, and allied fields. Dr. H. Schnelle, Associate Professor
at the University of Bonn, was also present. Because of his
interests in cybernetics, information storage and retrieval, and
mathematical linguistics, Dr. Schnelle holds a German defense research
contract on preliminary research that will lead to mechanical
translation. Mr. G. Beyer, Chief Interpreter on the staff of the
German Military Representative to MC/NATO in Washington, D. C.,
represented the interests of the foreign language service of the
German Federal Ministry of Defense in the United States. On
10 October Mr. Pendergraft presented our work in theoretical
linguistics to the group. An explanation of our work in
descriptive linguistics was given on the following day by Dr. Tosh.
At that time Dr. Schnelle also outlined his own approach to
mechanical translation. In conferences with Dr. W. P. Lehmann it was
decided that Miss E. Hoffmann, a member of Mr. Krollman's staff,
would join our descriptive linguistics group for six months to
familiarize herself with our methods. Her work will be supported
by the German government.

Mr. E. Pendergraft is conducting an advanced seminar
on  mechanical  translation  within  the  Linguistics  Program  of  The 
University  of  Texas  during  the  fall  semester. 

RESEARCH SUMMARY


The research under these contracts is currently being
conducted by three separate research groups whose
responsibilities are as follows:

A theoretical linguistics group, consisting chiefly of
mathematicians, has responsibility for development of the
hypothesis underlying the work of the other two groups. An early
result of the study was the conclusion that current theories of
linguistic structure were not explicit enough to support
applications in the field of mechanical translation. The first phase
of the research was therefore concerned with attempts to
explicate existing linguistic theories by means of formalization.
This work has progressed to include formalized hypotheses for
the syntactic and semantic relations which must be taken into
account in translation. A general theory of translation may be
based on this foundation; it has been completed in all important
details. Present theoretical studies are designed to extend and
further explore implications of the translation theory.

A systems group has responsibility for development of
a generalized computer system based on the above theory, and for
its operation in the performance of linguistic research on various
languages. All essential features of the System have now been
specified with sufficient precision for programming. It contains
three main sections: one for control, a second for language data
processing, and a third for linguistic information processing,
including three types of translation [1]. The first section is
now operational, and the second will be so in the next quarter.
The third section, which depends upon the second for its data,
is at present under development at the level of syntactic
analysis.

A descriptive linguistics group is engaged in testing
the hypothesis by applying it to the German and English languages.
Syntactic and semantic studies have been in progress for
approximately three years; at present they are oriented to a specific
corpus taken from Edward Ruechardt's book, Sichtbares und
Unsichtbares Licht, and its English translation [2]. German and
English dictionaries have also been based upon Der Sprach-
Brockhaus and Webster's New Collegiate Dictionary, respectively.

5 PROGRESS IN THE QUARTER

Research in the quarter included the following
activities:

5.1 Systems Programming and Operations

The current objective in programming remains as stated
in the previous report: to finish all programs needed to support
lexical and syntactic monolingual recognition by the end of the
year. Flow-charting and coding for this purpose have been
completed, and all of the necessary programs are now being tested
under careful monitoring to insure that this goal will be met.

Although program testing was performed primarily at
the U. S. Army Electronic Proving Ground at Fort Huachuca,
Arizona, toward the end of the quarter it became advantageous
to move programs to the CEIR Computer Center in Houston. Better
availability of computer time, together with greater mobility of
the programming staff, made this arrangement more efficient for
final checkout and assembly of sub-programs.

As grammar maintenance approached operational status,
the operations section prepared for its increased responsibility
by reviewing data handling and accounting procedures. The
processing of grammatical data began near the end of the quarter
as predicted, but these first uses of grammar maintenance programs
are considered as program testing; the programs have not as yet
been released for standard operational use.

The following progress was made in individual programs.

5.1.1 Control Program

The addition of new functions has caused the Control
Program  to  become  too  large  for  the  space  allotted  to  it  within 
the  System;  thus  it  became  necessary  to  store  infrequently  used 
segments  of  the  program  on  the  Program  Tape.  The  segments  will 
now  be  called  into  core  only  when  they  are  needed  during  System 
operation. 
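
The arrangement just described, in which infrequently used segments
are held on the Program Tape and called into core only on demand, is
what would now be termed an overlay scheme. The Python sketch below
is offered only as a modern illustration under stated assumptions:
the class name `ControlProgram`, the segment names, and the
dictionary standing in for the Program Tape are all hypothetical and
do not reproduce the original program.

```python
from typing import Callable, Dict, List

Segment = Callable[[], str]  # a segment is modeled as a callable (assumption)


class ControlProgram:
    """Toy model of a control program that overlays rarely used segments."""

    def __init__(self, resident: Dict[str, Segment],
                 program_tape: Dict[str, Segment]):
        self.resident = dict(resident)    # segments permanently held in core
        self.program_tape = program_tape  # stand-in for segments kept on tape
        self.tape_reads: List[str] = []   # record of each simulated tape read

    def run(self, name: str) -> str:
        # Frequently used segments are already in core.
        if name in self.resident:
            return self.resident[name]()
        # Otherwise the segment is fetched from the tape for this call only;
        # it is not retained, so core usage stays within a fixed allotment.
        segment = self.program_tape[name]
        self.tape_reads.append(name)
        return segment()


# Hypothetical segment names, chosen only for the illustration.
control = ControlProgram(
    resident={"dispatch": lambda: "dispatch done"},
    program_tape={"error_report": lambda: "error report done"},
)
print(control.run("dispatch"))      # served from core
print(control.run("error_report"))  # fetched from the simulated tape
print(control.tape_reads)
```

The trade-off shown is the one the text reports: core space is saved
at the price of a tape read each time a non-resident segment is
needed.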

Two additional improvements were made in the Control
section along with the above modification. Previously the Program
Tape had been generated on the central computer and was available
only during the processing mode of System operation [3, p. 3].
This tape may now be created by an off-line card-to-tape
operation and will be accessible during both processing and checkout
mode.

Most  of  the  modification  was  completed  in  the  quarter; 
it  included  the  coding  and  testing  of  a  bootstrap  routine  to 
initiate  self-loading  of  the  Control  Program  from  the  Program 
Tape  and  a  routine  to  load  other  programs  from  the  Program  Tape. 

5.1.2  General  Sort 

Testing of General Sort was completed early in the
quarter.  The  program  was  immediately  incorporated  into  the  System, 
where  its  use  in  tests  of  other  programs  proved  it  to  be  accurate. 

Experience with the program has already begun to
indicate data situations for which the program may not be as efficient
as specialized sorts. Every heuristic can be expected to be
inefficient for some kinds of data, however, and the present one seems
well suited to our needs. Study of the data situations which
evidence slow sorting has suggested additional heuristics which may
improve the generality of the program. These adjustments have
been given low priority, since our objective at this phase of the
study is not complete optimization but only optimization within
the limits of good research procedure. General sorts in the
System may be replaced by more specialized sorts at a later date
when higher priority programming and more careful studies of
sorting times have been completed.

General Sort is self-contained; it may be used either
as a subroutine or as a complete program. Its data must be
contained on an input tape with one sort item per record. General
Sort may read fixed or variable length records, as specified.
Two tapes are employed in the sort; the original input tape may
be saved, in which case three tapes are required for sorting.

Three types of sort are available; these may be
performed individually or in any combination for a given set of
data. Adaptation of General Sort to a specific problem is achieved
by means of a table, each entry of which specifies a type of sort
to be performed and the conditions for sorting.

The sorts now available are:

(1)  Sort  a  word  of  each  record  according  to  the  bit  mask 
supplied  with  this  entry.  The  word  to  be  sorted  is 
specified  in  this  entry. 

(2) Sort a table of words in each record according to the
bit mask supplied with this entry. The record word
in which the table begins and the record word
containing the table length are specified in this entry.

(3)  Sort  a  table  of  words  in  each  record  according  to  the 
bit  mask  supplied  with  this  entry.  The  record  word 
containing  the  beginning  location  of  the  table  and 
the  record  word  containing  the  table  length  are 
specified  in  this  entry. 

Additional sort types may be added to General Sort as
desired; the above three have been found to be adequate for our
present needs.
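
The table-driven design described above can be illustrated with a
minimal modern sketch in Python. It is not the original program: the
record layout (a list of integer words), the entry fields, and the
names `SortEntry`, `entry_key` and `general_sort` are assumptions
made for illustration. Types 2 and 3 are read here as ordering
records by a multi-word key (the table), a reading suggested by the
use of the second type for the alphabetical sort on all elements of a
rule reported under Grammar Maintenance; that interpretation is
itself an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Record = List[int]  # one sort item per record, modeled as a list of words


@dataclass
class SortEntry:
    kind: int             # 1, 2 or 3, numbered as in the list above
    mask: int             # bit mask applied to every key word
    word: int             # kind 1: index of the key word
                          # kind 2: index at which the table begins
                          # kind 3: index of the word holding the table start
    length_word: int = 0  # kinds 2 and 3: index of the word holding the length


def entry_key(record: Record, entry: SortEntry) -> Tuple[int, ...]:
    """Extract the masked key that one table entry selects from a record."""
    if entry.kind == 1:
        return (record[entry.word] & entry.mask,)
    if entry.kind == 2:
        start = entry.word
    elif entry.kind == 3:
        start = record[entry.word]
    else:
        raise ValueError(f"unknown sort type: {entry.kind}")
    length = record[entry.length_word]
    return tuple(w & entry.mask for w in record[start:start + length])


def general_sort(records: Sequence[Record],
                 table: Sequence[SortEntry]) -> List[Record]:
    """Order records by each table entry in turn (earlier entries dominate)."""
    return sorted(records, key=lambda r: tuple(entry_key(r, e) for e in table))


# Hypothetical layout: word 0 = form number, word 1 = table length,
# words 2 onward = the table itself.
records = [[3, 2, 9, 1], [1, 2, 9, 0], [2, 1, 4]]
by_form = general_sort(records, [SortEntry(kind=1, mask=0xFF, word=0)])
by_table = general_sort(
    records, [SortEntry(kind=2, mask=0xFF, word=2, length_word=1)])
print(by_form)
print(by_table)
```

Placing several entries in the table gives a combined sort in which
earlier entries take precedence, which is one plausible reading of
"in any combination".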

5.1.3  Request  Maintenance 

A relatively simple program to update, sort and verify
certain features of Request Tapes has been found advantageous
as an aid to systems operations. The program may be used to
correct obvious errors that are discovered when data are being
assembled for corpus, grammar or transfer maintenance, thus
avoiding more costly correction procedures in the System.

The program will be especially useful for the data
of new languages, since these may be assembled on Request Tapes
until studies have progressed sufficiently to merit the cost of
their maintenance within the System. Our Chinese and Russian
studies, for example, will initially have this status.

Design, flowcharting and coding for the program were
completed; it should be operational early in the next quarter.
The 416,000 cards containing German and English dictionaries
will then be preprocessed by the program before these data are
presented to the grammar maintenance process.

5.1.4  Corpus  Maintenance 

Final  testing  of  the  Corpus  Revision  and  Corpus  Display 
functions  was  completed,  and  the  programs  were  instated  on  the 
Program  Tape.  Previously,  corpus  maintenance  had  been  performed 
under  control  of  the  SOS  or  FAP  monitoring  systems. 

Coding and testing were completed for the only
remaining corpus maintenance function, Corpus Selection, which
prepares requested samples of corpora as input for Lexical
Recognition. Corpus Selection should be operational in the
next quarter.

5.1.5  Grammar  Maintenance 

About half of the quarter was spent in testing
individual segments of Grammar Revision. The ten segments of the
program were then combined and checked out as a unit with small
samples of data. When these tests were completed, more
comprehensive testing was performed with the complete German
grammar based on Ruechardt's Sichtbares und Unsichtbares Licht [2].

Early experiences with real data demonstrated that
a sort heuristic in the second segment of the program was
unsatisfactory. After study, a change in data handling
procedures reduced the sorting time by almost one half, and further
reductions are anticipated through the use of an additional
heuristic. On the whole the programs perform well and should
significantly reduce the cost of compiling and maintaining grammars.

The  above  experimentation  with  Grammar  Revision  led 
to  the  conclusion  that  some  of  the  ten  segments  could  be  combined 
to  yield  a  more  compact  program.  As  soon  as  the  reassembled  six- 
segment  program  has  passed  final  tests,  it  will  be  used  at  Fort 
Huachuca  to  compile  the  German  and  English  dictionaries. 

Individual segments of Probability Revision had been
almost completely tested at the end of the quarter. The
program should be assembled and tested as a whole early in the
coming quarter.

Those parts of Grammar Display needed for syntactic
and semantic data were coded, tested and made available in the
System. The results of Grammar Revision experiments were
displayed with the program; it currently provides listings of
syntactic or semantic rules in two different sorts, either
or both of which may be requested:

(1)  numerical  sort  on  form  number 

(2)  alphabetical  sort  on  all  elements  of  the  rule. 

Both sorts are achieved with General Sort, using the first and
second types of sorts mentioned above. The segments which will
provide displays of lexical data and the mnemonic equivalents
of syntactic or semantic variables have not yet been coded.
These are not essential to current goals and will be added as
time permits.

Testing of the individual segments of Input Grammar
Selection was completed. The segments were combined and
checkout was begun on the whole program. Although the program was
not operational at the end of the quarter, it is expected to be
ready soon to prepare comprehensive test data for Monolingual
Recognition.

5.1.6  Transfer  Maintenance 

Transfer maintenance programming continues to have
low priority at present. The only activity in this area was
completion of testing for the first of three segments of
Interlingual Transfer Revision. This segment was then combined with
the second, and testing of the pair was begun.

5.1.7 Monolingual Recognition

Testing  of  Lexical  Monolingual  Analysis  and  Lexical 
Monolingual  Analysis  Choice  was  completed;  the  two  programs  will 
be  combined  early  in  the  next  quarter  so  that  their  coordination 
may  be  checked  with  more  complex  data. 

Test data were prepared and preliminary testing was
started for Syntactic Monolingual Analysis. Considerable
progress was made in debugging toward the end of the quarter. The
program should be ready for checkout with more complex data early
in the next quarter.

Preliminary  testing  of  Syntactic  Monolingual  Analysis 
Choice  was  finished  early  in  the  quarter.  Since  the  analysis 
program  was  not  ready  for  the  services  of  this  routine,  more 
complex  data  were  prepared  to  test  the  choice  heuristic.  These 
experiments  were  still  in  progress  at  the  end  of  the  quarter. 

Lexical and Syntactic Monolingual Analysis Display
have been combined into a single program structure with two
segments. The first segment gathers results of analysis and
processes them into the order they will have in the display;
the second produces the display itself. Testing of the
individual segments is proceeding satisfactorily. Increasingly
complex data are being given to the first segment, which is the
more intricate of the two. The second segment now works for
most test cases, and should soon be available for assembly with
the first.

No additional work was done on Semantic Monolingual
Recognition  in  the  quarter. 

5.2 Descriptive Linguistics

Pilot  studies  for  Russian  and  Chinese  were  initiated 
during  the  quarter  because  of  the  increasing  availability  of 
structural  information  about  these  languages  from  other  groups. 
With  this  addition,  the  linguistics  group  was  organized  into 
research  sections  for  individual  languages,  and  was  renamed  in 
an  attempt  to  more  accurately  designate  its  responsibility. 
Progress  will  therefore  be  reported  by  language  section. 

5.2.1 English

During the tenth quarter a concerted effort was
initiated to encode the basic German and English dictionaries
for the translation system [4, p. 17]. From that time the various
parts of speech were encoded just as found in the dictionary
references, or without certain specified affixes. Consequently
any complex entries, such as noun compounds, derived adjectives,
etc., were encoded as whole forms or as forms with only
inflectional affixes removed.

Dictionary data are as a result not all in the final
form which would be desired for translation. It would not be
economical, for instance, to maintain noun compounds in unit
form. We have accordingly continued with an analysis of the
internal morphological constituency of complex entries, as was
first reported in the thirteenth quarter [1, p. 40].

Internal  analysis  was  completed  in  the  quarter  for 
verbs.  Because  nouns  constitute  by  far  the  greatest  portion 
of  dictionary  data,  a  considerable  amount  of  work  remains  in 
this  area;  at  the  present  time,  internal  analysis  of  nouns  is 
approximately  one-third  complete. 

In the area of syntax, a complete catalogue was made
of the specific tagmemic patterns of sentences of Corpus 5 [2].
We now feel reasonably confident in our understanding of internal
structures of the noun and adjective phrases and of the adverbials,
some of which would be identified internally as noun or
prepositional phrases. These elements were catalogued by recording
the appropriate sequences of variables as they occur in clauses.
All other elements, such as connectives or phrases which cannot as
yet be classified, were recorded in the clause sequences as
constants.

The  first  sentence  of  Corpus  5,  for  example,  reads: 

There  is  a  beautiful  expression  for  the  arrival 
in  this  world  of  a  new  member  of  the  human  family: 
the  infant  "sees  the  light  of  day". 

The  clause  sequence  of  this  sentence  was  recorded  as  follows: 

05 001

there  VERB/SING  NP/SING  for  NP/SING  in  NP/SING  of 
NP/SING  of  NP/SING:  NP/SING  "sees  the  light  of  day". 

The interrupting quotation marks make it difficult to classify
the expression "sees the light of day" consistently with other
data so far collected. The expression is entered as a sequence
of constants to mark it for later analysis. In the recorded
data, lower case letters represent constants. The remaining
symbols designate syntactic variables.
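
The recording convention just stated (lower case for constants, all
other symbols for syntactic variables) is simple enough to check
mechanically. The following Python sketch is an illustration only,
not part of the System described in this report; the function name
`classify_clause_sequence` and the decision to strip surrounding
punctuation before classifying are assumptions.

```python
import string
from typing import List, Tuple


def classify_clause_sequence(sequence: str) -> List[Tuple[str, str]]:
    """Label each token of a recorded clause sequence as constant or variable."""
    labeled = []
    for raw in sequence.split():
        # Drop surrounding quotes, colons and periods (assumed not significant).
        token = raw.strip(string.punctuation)
        if not token:
            continue
        # Lower-case tokens are constants; all other symbols are variables.
        kind = "constant" if token.islower() else "variable"
        labeled.append((token, kind))
    return labeled


recorded = ('there VERB/SING NP/SING for NP/SING in NP/SING of '
            'NP/SING of NP/SING: NP/SING "sees the light of day".')
for token, kind in classify_clause_sequence(recorded):
    print(f"{token:10} {kind}")
```

Applied to the first sentence of Corpus 5, the sketch separates the
seven syntactic variables from the prepositions and the quoted
expression, all of which were recorded as constants.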

Note  that  for  the  time  being  all  prepositions  were 
recorded  as  constants.  Furthermore,  we  did  not  try  to  classify 
clause  sequences  at  this  phase  of  the  analysis,  but  did  so  only 
after  all  clause  data  from  Corpus  5  had  been  assembled.  Clause 
rules  were  then  written  to  describe  the  structures  with  the 
greatest  possible  generality  within  the  limitations  of  the  data. 
This  technique  will  later  be  extended  to  include  additional 
corpora  by  means  of  automatic  syntactic  analysis. 

Structural  diagrams  were  also  prepared  for  all  1000 
sentences  of  Corpus  5,  except  for  a  small  residue  of  problematic 
occurrences  including  primarily  those  unique  or  near-unique 
structures  which  occur  only  once  or  a  few  times  in  the  corpus. 
Since  we  have  little  else  with  which  to  compare  these  problematic 
structures,  no  attempt  at  general  solutions  shall  be  made  without 
more data. A record is kept of such problems so that they may
be  reviewed  as  new  circumstances  dictate. 

Collection of synonymy data has been concentrated into
two activities [5]. The first is a study of synonymous
substitutes for nominals occurring in Corpus 5. This study was
done from the point of view of translation; the data consist
of English nominals corresponding as equivalents to German
nominals in the original corpus. The second is a study of
synonymous substitutes in general within the same corpus. It
is not limited to specific syntactic categories like nominals,
but may include as synonymous substitutes expressions of any
category from the size of words to sentences, and without
consideration of any language except English. To date, synonymous
nominal expressions have been recorded for occurrences in the
first 408 sentences of Corpus 5; the more general substitutes
have been recorded for 616 sentences.

5.2.2 German

Additional  work  was  done  on  the  internal  morphological 
constituency  of  German  dictionary  entries  in  studies  roughly 
paralleling  those  already  described  for  English.  No  internal 
analysis has as yet been done on German verbs. Adjective analysis
is approximately one-third complete. Priority has been
given  to  nouns,  since  this  category  constitutes  the  bulk  of 
dictionary  data.  An  accurate  estimate  of  the  status  on  noun 
analysis  is  not  presently  available;  however,  a  working  paper 
explaining  details  of  German  noun  coding  and  the  subsequent 
morphological  analysis  is  in  preparation  [5].  It  will  be  made 
generally  available  to  interested  individuals  or  research  groups. 

Three basic approaches to the organization of German
noun phrase data have resulted from earlier studies. The
general form of these alternatives is shown below; constants
functioning as inflectional suffixes are represented by x, y
and z:


The  alternatives  will  be  evaluated  with  additional  corpus  data 
when  automatic  syntactic  analysis  becomes  available. 

A survey of clause components, similar to the English
study described above, has been completed for the German version
of Corpus 5. Structural diagrams of all sentences in the corpus
were also prepared.

Compilation  of  German  synonymy  data  also  parallels  the 
methods  outlined  above.  Synonymous  nominal  expressions  have 
been  provided  for  all  occurrences  of  nominals  in  German  Corpus  5. 
Synonyms for adjectivals and other pre-nominal modifiers are
being  compiled  at  the  present  time. 

5.2.3  Russian 

Russian studies will concentrate first upon a
structural description of the grosser elements of sentences in
specific corpora. Three articles from Voprosy Ekonomiki,
yielding a corpus of over 500 sentences, were selected for the
pilot study [6]. The description was limited to structural
relations obtaining between clauses and the constituents of
individual clauses. The analysis of all 500 sentences was
completed, and structural diagrams were prepared for all
occurrences of clause structures. The description is
currently being reviewed for consistency.

5.2.4  Chinese 

Chinese studies were initiated with the same
methodology as the Russian. For the first corpus we are using a text
on language teaching which consists of 1357 sentences [7].
The clause analysis and all structural diagrams for the corpus
have been completed. A review of resultant data is in
progress.


5.3  Theoretical  Linguistics 

The  responsibilities  of  the  mathematics  group  were 
also  broadened  to  include  library  research  and  the  development 
of  other  sources  of  information  by  which  we  may  support  our 
efforts  in  descriptive  linguistics.  Mathematical  research 
will  continue  within  this  wider  framework. 

The preparation of publications summarizing research
results was emphasized in the quarter. Two new studies were
also begun: one pertaining to the concept of entropy in
stochastic formation structures, the other concerned with a
model containing as "sub-structures" all phases of the
translation process. These are both tentative orientations toward
a longer-range investigation of problems of optimization in
linguistic description and translation. Since the entropy
study has already reached completion, it is reported in
somewhat greater detail below.

5.3.1  Theory  of  Formation  Structures 

The  development  of  formation  structure  theory,  along 
with  its  interpretations  in  syntactics  and  semantics,  has 
acquired  such  dimensions  that  the  publication  will  be  divided
and  published  in  smaller  units.  The  first  such  working  paper, 
entitled  "Q-Collections  and  Concatenation",  was  completed,  and 
will  be  distributed  in  the  next  quarter.  The  paper  is  concerned 
with  basic  properties  of  sequences  of  symbols  occurring  in  the 
rules  of  formation  structures  [8,  appendix].  Interpretations 
of  the  theory  are  illustrated  by  numerous  linguistic  examples. 

The  second  paper  in  the  series,  concerned  with  the 
choice  of  axioms  for  formation  structures  and  derived  pro¬ 
perties,  is  essentially  complete,  although  theorems  were  added 
to  it  during  the  quarter.  It  is  anticipated  that  this  working 
paper  may  also  be  completed  in  the  next  quarter. 

5.3.2  Entropy  in  Stochastic  Grammars 

The  delimitation  of  various  concepts  of  entropy  in 
grammars  for  stochastic  formation  structures  was  undertaken 
in  order  to  explore  possible  relevance  to  techniques  of 
optimization.  An  investigation  was  first  directed  toward  a 
quality  evaluation  of  the  stochastic  grammar  in  terms  of 
probability  distributions  over  those  sets  of  rules  which  define 
the  membership  of  individual  syntactic  classes.  The  entropy 
computation  may  provide  a  basis  for  improving  the  grammar 
through  reclassification.  A  method  was  also  indicated  for  an 
entropy  computation  to  evaluate  the  grammar  as  a  whole.  These 
results  are  being  prepared  as  a  working  paper  for  general  dis¬ 
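
As a hedged illustration of the per-class computation described above, the following Python sketch computes the Shannon entropy of the probability distribution over the rules that define one syntactic class, and then a figure for the grammar as a whole. The class names, the rule probabilities, the class weights, and the choice of a weighted average for the whole-grammar measure are all assumptions made for illustration; the report does not state the formulas actually used.

```python
import math


def class_entropy(rule_probs):
    """Shannon entropy, in bits, of the distribution over one class's rules."""
    total = sum(rule_probs)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("rule probabilities for a class must sum to 1")
    # Rules with zero probability contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in rule_probs if p > 0)


def grammar_entropy(grammar, class_weights):
    """Weighted average of class entropies (an assumed whole-grammar measure)."""
    return sum(
        class_weights[name] * class_entropy(probs)
        for name, probs in grammar.items()
    )


# Hypothetical toy grammar: each syntactic class maps to the
# probabilities of the rules that define its membership.
toy_grammar = {
    "NP": [0.5, 0.25, 0.25],
    "VP": [0.5, 0.5],
}
# Hypothetical relative frequencies of the classes.
toy_weights = {"NP": 0.5, "VP": 0.5}

np_entropy = class_entropy(toy_grammar["NP"])
vp_entropy = class_entropy(toy_grammar["VP"])
whole_entropy = grammar_entropy(toy_grammar, toy_weights)
```

In this sketch a class whose rules are nearly equiprobable yields a high entropy, while a class dominated by a single rule yields a value near zero. How such values would guide reclassification is not specified in the report.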

CONCLUSIONS


It  is  anticipated  that  all  programming  for  Corpus  Maintenance,
Grammar  Maintenance,  and  Lexical  and  Syntactic  Monolingual
Recognition  will  be  operational  in  the  next  quarter.  The  mainte¬ 
nance  programs  will  probably  be  ready  before  the  end  of  November, 
with  completion  of  the  recognition  programs  following  about  one 
month  later,  in  mid-December. 

These  programs  are  essentially  a  compiler  for  syntactic 
recognition  algorithms  and  for  the  data  upon  which  such  algorithms 
operate.  They  are  designed  primarily  for  purposes  of  experimenta¬ 
tion  with  large  stores  of  language  data,  though  the  principles 
upon  which  they  are  based  may  readily  be  adapted  to  a  broader 
range  of  applications.  This  step  has  not  been  taken  because  it 
would  be  premature  and  wasteful.  At  present  the  most  urgent 
need  in  translation  research  is  more  accurate  knowledge  of  syntax. 

We  believe,  therefore,  that  the  capability  of  these 
programs  will  represent  an  important  milestone  in  this  project, 
and  perhaps  in  the  field  as  a  whole,  since  it  will  appreciably 
shorten  the  time-scale  for  experimentation  with  syntactic  analysis 
and  in  consequence  for  development  of  applications.  Our  studies 
of  Russian  and  Chinese  have  been  undertaken  to  make  better  use  of 
this  capability,  and  to  give  it  a  broader  test. 
PLANNING  FOR  THE  NEXT  QUARTER


The  systems  programming  section  will  concentrate  upon 
completion  of  programs  supporting  syntactic  recognition,  and 
will  then  undertake  the  necessary  indoctrination  to  make  them 
available  to  the  operations  section.  This  will  include  pre¬ 
paration  of  necessary  documentation;  in  particular,  the  users' 
manuals  and  appropriate  flow  diagrams  will  be  developed  for 
publication.  As  already  mentioned,  some  features  of  the  programs
which  do  not  interfere  with  immediate  use  will  be  added
at  convenient  times  during  future  programming  schedules.  This 
procedure  is  more  efficient  than  rigidly  defined  scheduling; 
it  will  also  be  employed  for  those  inevitable  adjustments  in 
programs  which  result  from  their  first  large-scale  use. 

As  German  and  English  data  become  available  through 
the  compilation  of  grammars,  the  descriptive  linguistics  group 
will  give  priority  to  error  detection  and  correction  in  pre¬ 
paration  for  the  first  grammar  revision  cycle.  For  the  time 
being,  two  grammars  of  each  language  will  be  maintained  in  the 
System:  a  smaller  corpus-oriented  one  for  experimentation,  and 
one  for  comprehensive  description  of  the  language.  Efforts 
will  be  made  in  the  next  quarter  to  clear  up  residual  problems 
in  German  and  English  clause  structures. 

Studies  of  Russian  and  Chinese  syntax  will  be  continued
in  their  present  status:  resultant  data  will  be  accumulated  on
Request  tapes,  but  will  not  be  maintained  in  the  Language  Data
Processing  Section. 

Work  in  theoretical  linguistics  will  be  a  continuation 
of  the  activities  outlined  above,  with  emphasis  in  the  coming
quarter  again  upon  completion  and  distribution  of  research  pub¬ 
lications  now  in  preparation.  Planning  will  begin  on  specific 
goals  for  library  research,  and  for  maintenance  of  a  glossary 
of  the  technical  terms  being  used  in  our  publications. 
Griffiss  Air  Force  Base,  New  York

AFSC  Scientific/Technical  Liaison  Office 
U.  S.  Naval  Air  Development  Center 
Johnsville,  Pennsylvania 

Commanding  Officer
U.  S.  Army  Electronics  Materiel  Support  Agency
ATTN:  SELMS-ADJ
Fort  Monmouth,  New  Jersey

Director,  Fort  Monmouth  Office
USA  Communication  and  Electronics 
Fort  Monmouth,  New  Jersey 

Corps  of  Engineers  Liaison  Office 
U.  S.  Army  Electronics  Research  &
Fort  Monmouth,  New  Jersey

Marine  Corps  Liaison  Office
U.  S.  Army  Electronics  Research  &
Fort  Monmouth,  New  Jersey

AFSC  Scientific/Technical  Liaison
U.  S.  Army  Electronics  Research  &
Fort  Monmouth,  New  Jersey

Commanding  Officer
U.  S.  Army  Electronics  Research  &
ATTN:  Logistics  Division
Fort  Monmouth,  New  Jersey

For  D.D.  Jacoby,  SELRA/NPE

Commanding  Officer
U.  S.  Army  Electronics  Research  &
ATTN:  Director  of  Research
Fort  Monmouth,  New  Jersey

Commanding  Officer
U.  S.  Army  Electronics  Research  &
ATTN:  Technical  Documents  Center
Fort  Monmouth,  New  Jersey

Commanding  Officer
U.  S.  Army  Electronics  Research  &
ATTN:  SELRA/NPE
Fort  Monmouth,  New  Jersey

Commanding  Officer
U.  S.  Army  Electronics  Research  &
ATTN:  Technical  Information  Divis
Fort  Monmouth,  New  Jersey

Commanding  Officer
U.  S.  Army  Electronics  Research  &  Development  Laboratory
ATTN:  J.  Benson,  SELRA/XC
Fort  Monmouth,  New  Jersey

Commanding  Officer
U.  S.  Army  Electronics  Research  &  Development  Laboratory
ATTN:  SELRA  /X
Fort  Monmouth,  New  Jersey

Ramo-Wooldridge  Corporation 
Los  Angeles,  California 
ATTN:  Information  Systems  Dept. 

Carnegie  Institute  of  Technology 
Schenley  Park 
Pittsburgh,  Pennsylvania 
ATTN:  Dr.  Alan  Perlis
Dr.  Allen  Newell 

R.C.A.  Data  Systems  Division 
4922  Fairmont  Avenue 
Bethesda  14,  Maryland 
ATTN:  Dr.  Jack  Minker 

Commanding  Officer
U.  S.  Army  Signal  Corps  School
Officers  Dept.
Fort  Monmouth,  New  Jersey
ATTN:  ADPS  Committee

Commanding  Officer
U.  S.  Army  Signal  Center  &  School
Dept.  Specialist  Training
Fort  Monmouth,  N.  J.
ATTN:  Information  Processing  Branch

Commanding  Officer
U.  S.  Army  Signal  Engineering  Agency
Washington  25,  D.  C.
ATTN:  Computer  Systems  Division

Lincoln  Laboratories 
Lexington  72,  Massachusetts 
ATTN:  Army  Liaison  Officer 

National  Bureau  of  Standards
Washington  25,  D.  C.
ATTN:  Mrs.  Ida  Rhodes
Mechanical  Translation  Project
ATTN:  Mr.  Russell  Kirsch
Data  Processing  Systems  Division

National  Science  Foundation
Washington  25,  D.  C.
ATTN:  Mr.  Richard  See
Documentation  Research  Program

Director,  National  Security  Agency
Fort  George  G.  Meade,  Maryland
ATTN:  C-3141  (Rm  2C087)

Librarian  2 

Professor  E.  de  Grolier
Institute  National  Des  Techniques
De  La  Documentation
Paris,  France  1

International  Electric  Corporation
Box  285
Attn:  J.  Harlow
Paramus,  New  Jersey  1